Conversation
2489ce4 to
a3e7e3c
Compare
Major changes to convertGVFtoVCF: - removed compare_and_merge lines and placed functions into VCFlines file Major changes to VCF line: - improved single responsibility principle - added many functions. Function moved from VCFlines to Utils.py (build_iupac_ambiguitycode) New files added: - Lookup.py for referencing - New test files: test_assisting_converter, test_logger, test_utils, test_vcfline (separated out tests to match the file structure) Optimised the function: read_in_gvf_file by removing extraneous for loop
convert_gvf_to_vcf/vcfline.py
Outdated
|
|
||
| self.vcf_value, self.info_string, self.format_dict = convert_gvf_attributes_to_vcf_values(gvf_feature_line_object.attributes, mapping_attribute_dict, field_lines_dictionary, all_possible_lines_dictionary) | ||
| (self.vcf_value, # used to populate the VCF fields. This is a dict of non-converted GVF attribute keys and their values. | ||
| self.info_string, # a formatted INFO string to form VCF line |
There was a problem hiding this comment.
You should return and store a info_dict rather than an info_string and leave the formatting the the __str__ function
There was a problem hiding this comment.
addressed: strings have been removed
convert_gvf_to_vcf/vcfline.py
Outdated
| @@ -67,10 +74,11 @@ def __init__(self, | |||
There was a problem hiding this comment.
This should use the order_format_keys function although I don't thin you should construct the format string here. It should be reserved to the __str__
There was a problem hiding this comment.
addressed: used order_format_keys and removed strings
convert_gvf_to_vcf/vcfline.py
Outdated
| self.format, | ||
| self.format_values_by_sample_string |
There was a problem hiding this comment.
You should not be using stored strings. The should be constructed in this function from the list of format tag and the format_dict.
There was a problem hiding this comment.
addressed: removed strings
convert_gvf_to_vcf/vcfline.py
Outdated
| self.pos, | ||
| self.merge_and_add(self.id, other_vcf_line.id, ";"), | ||
| self.ref, | ||
| self.merge_and_add(self.alt, other_vcf_line.alt, ","), |
There was a problem hiding this comment.
What happens if you merge 3 lines. First contains a <DEL>, second contains a <INS> and third contains a <DEL>?
merge('<DEL>', '<INS>') --> '<DEL>,<INS>'
merge('<DEL>,<INS>', '<DEL>') --> '<DEL>,<INS>,<DEL>'
There was a problem hiding this comment.
not addressed yet
| # The functions below relate to the VCF objects | ||
| def compare_vcf_objects(list_of_vcf_objects): | ||
| """ Compares VCF objects in the list with the VCF object before it. Returns boolean values. | ||
| :params: list_of_vcf_objects: list of vcf objects | ||
| :return: comparison_results: list of booleans. For future reference, if True, this will determine merging lines; if False, this will determine use of the previous line. | ||
| """ | ||
| sample_format_value_tokens = [] | ||
| for sample in list_of_sample_names: | ||
| if sample in sample_name_dict_format_kv: | ||
| format_value = sample_name_dict_format_kv[sample] | ||
| sample_format_value_tokens.append(':'.join(format_value.values())) | ||
| comparison_results = [] | ||
| # For each vcf line object, compare with the previous vcf line object in the list | ||
| for index in range(1, len(list_of_vcf_objects)): | ||
| current_vcf_object = list_of_vcf_objects[index] | ||
| previous_vcf_object = list_of_vcf_objects[index - 1] | ||
| # Determines the VCF line objects as equal based on the CHROM, POS and REF being the same (__eq__ in Vcfline) | ||
| if current_vcf_object == previous_vcf_object: | ||
| comparison_results.append(True) # This will use require merging. | ||
| else: | ||
| format_value = "." # set to missing value | ||
| sample_format_value_tokens.append(format_value) | ||
| sample_format_values_string = '\t'.join(sample_format_value_tokens) | ||
| return sample_format_values_string | ||
| comparison_results.append(False) # No merging required. Use previous object. | ||
| return comparison_results |
There was a problem hiding this comment.
Calculating the comparison and storing the results seem overly complicated and there are function that exists in python to do what you want.
You should look for itertools.groupby that will probably help
There was a problem hiding this comment.
not addressed yet
| def merge_vcf_objects(previous, current, list_of_sample_names): | ||
| """ Merge VCF objects. | ||
| :params: previous: previous VCF line object | ||
| :params: current: current VCF line object | ||
| :params: list_of_sample_names: sample names | ||
| :return: merged_object | ||
| """ | ||
| merged_object = previous.merge(current, list_of_sample_names) | ||
| return merged_object | ||
|
|
||
| def keep_vcf_objects(previous, list_of_sample_names): | ||
| """ Keep VCF objects. | ||
| :params: previous VCF line object | ||
| :return: kept_object | ||
| """ | ||
| kept_object = previous.keep(list_of_sample_names) | ||
| return kept_object |
There was a problem hiding this comment.
These two functions seems unnecessary.
There was a problem hiding this comment.
not addressed yet
convert_gvf_to_vcf/vcfline.py
Outdated
| ) | ||
|
|
||
| def keep(self, list_of_sample_names): | ||
| self.format_values_by_sample_string = self.combine_format_values_by_sample(self.format_dict, list_of_sample_names) |
There was a problem hiding this comment.
This should not be assigned to self.
There was a problem hiding this comment.
not addressed yet
| return self.info_string | ||
|
|
||
| # MERGE OR KEEP below | ||
| def merge(self, other_vcf_line, list_of_sample_names): |
There was a problem hiding this comment.
The merge and keep functions are returning a set of strings ready to be printed where I think you should keep the information in the VcfLine object until the end and print them using the __str__
There was a problem hiding this comment.
addressed: return self and formatted with str
| self.format_values_by_sample_string | ||
| ) | ||
|
|
||
| def keep(self, list_of_sample_names): |
There was a problem hiding this comment.
The keep function does not do much except updating the list of samples in the VCF line. I think you should change the name of the function to describe what it actually does.
There was a problem hiding this comment.
not addressed yet
…__str__ to write the vcf line object
No description provided.